
[SPARK-10543] [CORE] Peak Execution Memory Quantile should be Per-task Basis#8726

Closed
saurfang wants to merge 2 commits into apache:master from saurfang:stagepage

@saurfang

Contributor

Read `PEAK_EXECUTION_MEMORY` using `update` to get the per-task partial value instead of the cumulative value.

I tested with this workload:

```scala
val size = 1000
val repetitions = 10
val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions), util.Random.nextDouble)).toDF("key", "value")
val res = data.toDF.groupBy("key").agg(sum("value")).count
```

Before:
image

After:
image

Tasks view:
image
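
To illustrate why the quantiles differ, here is a minimal, Spark-free sketch. `AccumInfo`, `accumulate`, and `quantiles` are hypothetical names made up for this example (they are not Spark's API), the peak values are invented, and the index-based quantile rule is just one simple choice. It only assumes that each finished task reports both its own contribution (`update`) and the running total (`value`).

```scala
// Hypothetical stand-in for the accumulator info attached to each finished task:
// `update` is this task's own contribution, `value` is the running total so far.
final case class AccumInfo(update: Option[Long], value: Long)

object PeakMemoryQuantiles {
  // Build per-task infos from each task's peak memory (bytes), in completion order.
  def accumulate(peaks: Seq[Long]): Seq[AccumInfo] = {
    val runningTotals = peaks.scanLeft(0L)(_ + _).tail
    peaks.zip(runningTotals).map { case (p, total) => AccumInfo(Some(p), total) }
  }

  // Simple index-based quantiles (min, 25th, median, 75th, max) over sorted data.
  def quantiles(data: Seq[Long]): Seq[Long] = {
    val sorted = data.sorted
    Seq(0.0, 0.25, 0.5, 0.75, 1.0).map { q =>
      sorted(math.min((q * sorted.length).toInt, sorted.length - 1))
    }
  }

  def main(args: Array[String]): Unit = {
    // Invented peak values for five tasks.
    val infos = accumulate(Seq(100L, 300L, 200L, 500L, 400L))
    val cumulative = quantiles(infos.map(_.value))
    val perTask = quantiles(infos.map(_.update.getOrElse(0L)))
    println(s"from cumulative value: $cumulative")
    println(s"from per-task update:  $perTask")
  }
}
```

With these invented peaks, quantiles taken from the cumulative `value` top out at 1500 even though no single task exceeded 500, which is the kind of inflation the summary table showed before this change.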

cc @andrewor14 I'd appreciate it if you could give feedback on this, since I think you introduced the display of this metric.

@andrewor14

Contributor

add to whitelist

@andrewor14

Contributor

Yeah, I think this is correct. Would you mind adding a unit test for it? If you prefer to do it separately, we can also just merge this first.

@saurfang

Contributor Author

I added a naive unit test. Let me know if you think it's sufficient or clear.

@SparkQA

SparkQA commented Sep 12, 2015


Test build #42364 has finished for PR 8726 at commit 3adbc33.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Sep 12, 2015


Test build #42365 has finished for PR 8726 at commit a379115.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@andrewor14

Contributor

Yup, it looks good. Thanks for fixing this; I'm merging into master and 1.5.

asfgit pushed a commit that referenced this pull request Sep 14, 2015
…k Basis

Read `PEAK_EXECUTION_MEMORY` using `update` to get the per-task partial value instead of the cumulative value.

I tested with this workload:

```scala
val size = 1000
val repetitions = 10
val data = sc.parallelize(1 to size, 5).map(x => (util.Random.nextInt(size / repetitions),util.Random.nextDouble)).toDF("key", "value")
val res = data.toDF.groupBy("key").agg(sum("value")).count
```

Before:
![image](https://cloud.githubusercontent.com/assets/4317392/9828197/07dd6874-58b8-11e5-9bd9-6ba927c38b26.png)

After:
![image](https://cloud.githubusercontent.com/assets/4317392/9828151/a5ddff30-58b7-11e5-8d31-eda5dc4eae79.png)

Tasks view:
![image](https://cloud.githubusercontent.com/assets/4317392/9828199/17dc2b84-58b8-11e5-92a8-be89ce4d29d1.png)

cc andrewor14 I'd appreciate it if you could give feedback on this, since I think you introduced the display of this metric.

Author: Forest Fang <forest.fang@outlook.com>

Closes #8726 from saurfang/stagepage.

(cherry picked from commit fd1e8cd)
Signed-off-by: Andrew Or <andrew@databricks.com>
@asfgit closed this in fd1e8cd on Sep 14, 2015